Testnet4 including PoW difficulty adjustment fix #29775

Open
wants to merge 3 commits into
base: master

fjahr
Contributor

@fjahr fjahr commented Mar 31, 2024

To supplement the ongoing conceptual discussion about a testnet reset I have drafted a move to v4 including a fix to the difficulty adjustment mechanism, which was part of the motivation that started the discussion.

Conceptual considerations:

  • The conceptual discussion about doing a testnet4 or softforking the fix into testnet3 is outside of the scope of this PR and I would ask reviewers to contribute their opinions on this on the ML instead. However, I am happy to adapt this PR to a softfork change on testnet3 if there is consensus for that instead.
  • The difficulty adjustment fix suggested here touches the CalculateNextWorkRequired function and uses the same logic used in GetNextWorkRequired to find the last previous block that was not mined with difficulty 1 under the exception. An alternative fix briefly mentioned on the mailing list by Jameson Lopp would be to "restrict the special testnet minimum difficulty rule so that it can't be triggered on the block right before a difficulty retarget". That would also fix the issue but I find my suggestion here a bit more elegant.
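
The retarget bug and the fix described in the second bullet can be sketched roughly as follows. This is a simplified Python model, not the PR's C++ code: targets are plain integers rather than compact nBits, and `Block`, `last_real_target`, and `next_target` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass

INTERVAL = 2016                      # blocks per difficulty period
TARGET_TIMESPAN = 14 * 24 * 60 * 60  # two weeks, in seconds
POW_LIMIT = (1 << 224) - 1           # easiest allowed target (difficulty 1)


@dataclass
class Block:
    target: int  # larger target = lower difficulty


def last_real_target(chain, idx):
    """Walk back from idx, skipping min-difficulty exception blocks,
    but never past the first block of the period."""
    while idx % INTERVAL != 0 and chain[idx].target == POW_LIMIT:
        idx -= 1
    return chain[idx].target


def next_target(base_target, actual_timespan):
    """Standard retarget: scale the base target by the clamped timespan."""
    clamped = min(max(actual_timespan, TARGET_TIMESPAN // 4), TARGET_TIMESPAN * 4)
    return min(base_target * clamped // TARGET_TIMESPAN, POW_LIMIT)


# One full period at 1000x minimum difficulty whose last block used the exception.
real_target = POW_LIMIT // 1000
chain = [Block(real_target) for _ in range(INTERVAL - 1)] + [Block(POW_LIMIT)]
last = len(chain) - 1

# Unfixed behaviour: the retarget starts from the last block's target,
# so the difficulty collapses to the minimum.
legacy = next_target(chain[last].target, TARGET_TIMESPAN)

# Sketch of the fix: start from the last block mined at the real difficulty.
fixed = next_target(last_real_target(chain, last), TARGET_TIMESPAN)

print(legacy == POW_LIMIT, fixed == real_target)
```

In this model a single exception block at the end of a period is enough to reset the difficulty, while the walk-back keeps the real difficulty as the base for the retarget.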

@DrahtBot
Contributor

DrahtBot commented Mar 31, 2024

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Code Coverage

For detailed information about the code coverage, see the test coverage report.

Reviews

See the guideline for information on the review process.

Type Reviewers
ACK jsarenik
Concept NACK kcalvinalvin
Concept ACK murchandamus
Stale ACK wiz, jlopp, russeree, craigraw, Emzy, Rob1Ham, Sjors

If your review is incorrectly listed, please react with 👎 to this comment and the bot will ignore it on the next update.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #30203 (Enhance signet chain configuration in bitcoin.conf by BrandonOdiwuor)
  • #29876 (build: add -Wundef by fanquake)
  • #29686 (Update manpage descriptions by willcl-ark)
  • #29432 (Stratum v2 Template Provider (take 3) by Sjors)
  • #28710 (Remove the legacy wallet and BDB dependency by achow101)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@murchandamus
Contributor

Concept ACK

@maflcko
Member

maflcko commented Apr 2, 2024

When resetting a test chain, it is also important to consider the script interpreter coverage of the current chain. Test chains are (usually) the first place to go to, to test new script primitives and protocols, as well as consensus deployments. The existing chain thus serves as a test for consensus implementations, apart from basic unit test vectors. It would be good to think how to preserve the test vectors in the chain. See also #11739 (comment) . Or if it is not needed, it would be good to say so. Maybe https://github.com/bitcoin-core/qa-assets/blob/main/unit_test_data/script_assets_test.json already covers a good portion?

Moreover, testnet is the only public chain where anyone can submit a nonstandard transaction from their laptop. Recall that policy is enforced on all networks equally (see commit e1dc15d), so getting a non-mempool transaction into a block is only possible for a miner, or by cooperating with a miner. So if the difficulty hack is removed completely, anyone wishing to submit a transaction would have to go purchase and set up mining hardware, or find a miner willing to accept the transaction. Not saying what is the best approach here, just saying that the effects should be considered and a change be done intentionally.

@ajtowns
Contributor

ajtowns commented Apr 2, 2024

Probably we should support tracking both testnet3 and the new testnet4 for some time. Making the new code conditional on a different chain param that's only set for testnet4 would probably be the easiest way of accomplishing that?

@fjahr
Contributor Author

fjahr commented Apr 6, 2024

Pushed some improvements and addressed some feedback. I am experimenting with some of the proposals from the mailing list and so I added Andrew Poelstra's suggested difficulty adjustment with 6h/1M from here: https://groups.google.com/g/bitcoindev/c/9bL00vRj7OU/m/kFPaQCzmBwAJ

Probably we should support tracking both testnet3 and the new testnet4 for some time. Making the new code conditional on a different chain param that's only set for testnet4 would probably be the easiest way of accomplishing that?

Updated the code to introduce T4 (-testnet4) and keep T3 in place but give some deprecation warning when it's used. I guess we would keep this in place for 1 or 2 releases and then remove T3 and then let -testnet run T4.

I am using the Genesis block hash to distinguish between the two testnets. There may be cleaner solutions but I think this is ok since it would be only temporary until T3 is removed.

Moreover, testnet is the only public chain where anyone can submit a nonstandard transaction from their laptop.

Is it really realistic that someone with just their CPU would be able to mine a block with their non-standard tx on the current testnet? If the bug isn't active currently they would need to wait for it to become active and that could take weeks, right? And when it becomes active I would imagine the miner who found the first block in the difficulty=1 series just blasts the network and the CPU miner still has no chance to get a block in between.

We could revert #28354 for testnet4 if this is a feature that matters to users. Is it too much to ask that people use testmempoolaccept or run their own testnet node with -acceptnonstdtxn=0 when they want to test standardness of their tx? I would say no. And I don't see another option to solve this on layer 1 if we assume nobody changes the defaults.

It would be good to think how to preserve the test vectors in the chain.

Interesting thought. I think once there is consensus to do T4 we will find a creative solution for this. It would be cool to convert this coverage to fuzzing coverage somehow but I am not sure if that's realistic or worth the effort. Otherwise, we could write a program that looks at all the different scripts that exist on T3 and replays them on T4, or, if we can compress them somehow, for example by filtering out everything that doesn't add coverage, we could turn it into a unit test that replays the interesting scripts.

@murchandamus
Contributor

Since some people consider the blockstorms an interesting feature of Testnet3, it might be interesting to only raise the difficulty of the delayed block exception to 100,000 instead of 1,000,000. This would allow the network to return to the organic difficulty in fewer difficulty periods and slow down the blockstorms but not remove the feature altogether. My understanding is that this would correspond roughly to a tenth of one S9 mining on the network, so if no one had mined for a while, a single S9 could restart the network with ~60 s blocks, but wouldn’t churn out thousands of blocks per second.
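
As a rough sanity check of these figures, the expected block time at a given difficulty can be estimated from the usual approximation of difficulty × 2^32 hashes per block. The 14 TH/s S9 hashrate is an assumption taken from the nominal figure quoted later in this thread, and `expected_block_seconds` is a hypothetical helper.

```python
HASHES_PER_DIFFICULTY = 2 ** 32   # expected hashes per block at difficulty 1
S9_HASHRATE = 14e12               # 14 TH/s, nominal S9 figure (assumed)


def expected_block_seconds(difficulty, hashrate):
    """Average time to find one block at the given difficulty and hashrate."""
    return difficulty * HASHES_PER_DIFFICULTY / hashrate


one_s9 = expected_block_seconds(100_000, S9_HASHRATE)
tenth_s9 = expected_block_seconds(100_000, S9_HASHRATE / 10)
print(f"one S9: {one_s9:.0f} s, a tenth of an S9: {tenth_s9 / 60:.1f} min")
```

Under these assumptions the results land within roughly a factor of two of the estimates above, which is about as close as a back-of-the-envelope figure can be expected to get.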

Only allowing lower difficulty blocks after 6 hours could easily make testnet useless for extended periods, if someone put several ASICs on testnet for a while, it might prevent other users from getting confirmations for up to 6 hours. I could see an increase from the twenty minute rule to maybe an hour, but more seems counter to why the rule was introduced in the first place.

@maflcko
Member

maflcko commented Apr 10, 2024

Is it really realistic that someone with just their CPU would be able to mine a block

Yes, I am not sure what would be the problem. All you have to do is to set the time +20min and mine a block on your laptop. If you don't want to try it yourself, you can come by to watch it on my laptop.

Since some people consider the blockstorms an interesting feature of Testnet3

After a quick chat with @murchandamus, an alternative fix would be to require the pre-retarget block to have the "correct" difficulty, so that all retarget periods are organic. The +20min hack would remain to allow a CPU to mine a few blocks, if needed, however, a block storm would be naturally limited by the +120h cut-off rule. This would limit the block storms to small block "gusts", which seems good enough to make everyone happy?
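
A minimal sketch of this alternative, assuming the existing testnet rule (a block may use the minimum difficulty when its timestamp is more than twice the target spacing after its parent) and adding the restriction for the last block of each period. The function name and the `protect_retarget` switch are hypothetical and only serve to compare the two behaviours.

```python
INTERVAL = 2016            # blocks per difficulty period
TARGET_SPACING = 10 * 60   # ten minutes, in seconds


def min_difficulty_allowed(height, block_time, prev_time, protect_retarget=True):
    """Return True if the block at `height` may use the minimum difficulty."""
    # Existing rule: timestamp more than 2 * spacing (20 min) after the parent.
    late_enough = block_time > prev_time + 2 * TARGET_SPACING
    # Proposed restriction: the last block of a period must carry the
    # real difficulty, so every retarget starts from an organic value.
    is_pre_retarget = (height + 1) % INTERVAL == 0
    if protect_retarget and is_pre_retarget:
        return False
    return late_enough


print(min_difficulty_allowed(100, 1201, 0))    # ordinary block, 20 min + 1 s late
print(min_difficulty_allowed(2015, 5000, 0))   # last block of the first period
```

With the restriction in place a CPU miner can still use the +20min rule on 2015 of every 2016 blocks, but cannot influence the base difficulty of the next period.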

@Sjors
Member

Sjors commented Apr 15, 2024

I spun up seed.testnet4.bitcoin.sprovoost.nl and set it to use the magic bytes and port number.

When running with -debug=net I noticed it "Added hardcoded seed" a bunch of times for i2p and onion, which makes no sense. I'm guessing those are for testnet3.


If anyone wants to deploy a faucet, let me know and I'll send some coins... unless someone reorgs me.

@luke-jr
Member

luke-jr commented Apr 20, 2024

This seems too complicated for a testnet exception IMO. And it breaks the use case of someone testing being able to mine a block on-demand without actual mining hardware.

Shouldn't it be enough to just fix the timewarp bug?

@Sjors
Member

Sjors commented Apr 22, 2024

it breaks the use case of someone testing being able to mine a block on-demand without actual mining hardware

I doubt many people do that. You can still set nProofOfWorkLimit to a lower value. We could add a code comment for that (or a setting). That way you can mine locally as fast as you want, without causing mayhem for others.

@maflcko
Member

maflcko commented Apr 23, 2024

I doubt many people do that.

Two people raised the concern in this thread, so why would you doubt it?

@fjahr
Contributor Author

fjahr commented Apr 23, 2024

Yes, I am not sure what would be the problem. All you have to do is to set the time +20min and mine a block on your laptop. If you don't want to try it yourself, you can come by to watch it on my laptop.

I missed that response, so if this is possible at any time with or without a block storm happening, I am not sure how the change here is making a difference? I will give it a try.

@fjahr
Contributor Author

fjahr commented Apr 23, 2024

I doubt many people do that.

Two people raised the concern in this thread, so why would you doubt it?

"Many" is very relative but I think we probably would not see a market for trading testnet coins against bitcoin if that is something everyone can do as easily as setting up a Bitcoin Core node for example.

@Sjors
Member

Sjors commented Apr 23, 2024

I am not sure how the change here is making a difference?

If it depends on the difficulty being 1 rather than 1 million, that would make a difference. The two people who brought it up can definitely recompile, but maybe there's a better solution - maybe just a startup flag to override the minimum difficulty?

@maflcko
Member

maflcko commented Apr 23, 2024

maybe just a startup flag to override the minimum difficulty?

I don't think consensus rules of remote nodes can be affected by a local startup flag (or re-compilation).

If someone wanted to create a block locally only, they could use regtest.

@sipa
Member

sipa commented May 23, 2024

There are several BIPs that contain specifications relating to testnet, so perhaps a BIP is the right place to define testnet4? The BIP process predates testnet3, but only by a few months, so I don't think we should see the lack of a testnet3 BIP as an argument against this.

@kcalvinalvin
Contributor

I don't think we should see the lack of a testnet3 BIP as an argument against this.

Not even a BIP but some document that specifies testnet4 besides just a PR that still might get changed.

I think in 2024 we can agree that there's more than just Bitcoin Core and asking other implementations to "read the Bitcoin Core codebase" is a ridiculous ask.

@fjahr
Contributor Author

fjahr commented May 27, 2024

Here is a BIP PR for Testnet 4: bitcoin/bips#1601

I think the written specification needs to be a BIP to be considered meaningful in the long run. If I just put it in a gist or something like that it depends on me alone to make changes should they become necessary for example. I would rather have it be managed by the community if the written specification is what people turn to.

The PR still might get changed obviously but I will update the BIP PR accordingly.

@fjahr
Contributor Author

fjahr commented May 28, 2024

I'm pouring one out for all the tACKs we've lost but the rebase was necessary for a possible merge.

I have addressed the comments from @Sjors and I think those were all that are in scope for this PR here. Mostly it's adding comments and two small code simplifications.

git range-diff master 06c2c713c52b60231efc3e00d2c5eb0bf9e345f9 86fea43762859478868bcca66e7ab56e8728e58f

I think the chain replay idea from @TheBlueMatt is probably best tracked in a separate issue. Potentially there are already projects out there that can provide the necessary functionality, I am not aware of anything like that though.

@fjahr
Contributor Author

fjahr commented May 28, 2024

The CI failure doesn't seem related, somehow the test-each-commit job was instantly cancelled.

@fjahr
Contributor Author

fjahr commented May 29, 2024

In the current implementation it's far quicker to jack up the difficulty by 1000x than it takes to drop it. I didn't do the math on the way up, the way down takes 40 weeks (4 week retarget period, each cutting difficulty in half).

The testnet4 specific code could increase nActualTimespan by n minutes for each minimum difficulty block. For n = 20 that speeds up the way down to 20 weeks (4 week retarget period, each cutting difficulty by 4). For n = 60 the network (almost) recovers in 12 weeks (4 week retarget period, each cutting difficulty by 8).
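
The recovery times in the quoted comment can be reproduced with a short calculation. This sketch assumes every block in a period is a 20-minute exception block, that a 1000x difficulty increase has to be undone, and that the usual 4x clamp on the retarget is not applied (otherwise the n = 60 case could not cut by 8). `recovery` is a hypothetical helper.

```python
RETARGET_BLOCKS = 2016
TARGET_TIMESPAN_MIN = 14 * 24 * 60   # two weeks, in minutes
REAL_MINUTES_PER_BLOCK = 20          # every block waits for the 20-minute exception


def recovery(extra_minutes, total_drop=1000):
    """Return (cut per period, periods needed, weeks needed)."""
    # Timespan counted by the retarget: real time plus n minutes per block.
    counted = (REAL_MINUTES_PER_BLOCK + extra_minutes) * RETARGET_BLOCKS
    cut = counted // TARGET_TIMESPAN_MIN   # difficulty divisor per period
    periods, reduction = 0, 1
    while reduction < total_drop:
        reduction *= cut
        periods += 1
    # Real elapsed time is unaffected by the bonus: 2016 * 20 min = 4 weeks.
    weeks = periods * REAL_MINUTES_PER_BLOCK * RETARGET_BLOCKS // (7 * 24 * 60)
    return cut, periods, weeks


for n in (0, 20, 60):
    print(n, recovery(n))
```

For n = 60 the loop needs a fourth period to pass 1000x, since three periods only give 8^3 = 512x; that is the "(almost)" in the 12-week figure.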

I forgot to address this comment from @Sjors earlier: I think it's an interesting idea but I am not sure about the adverse effects this could have. The base case for the network should be that we have a fluctuating but somewhat stable hashrate and a few people will use the 20-min rule to get their non-standard txs in or just to get some coins. How many of the 2016 blocks these will be, I don't know. Let's say it's 100 20-min blocks and they are always mined instantly (no time wasted; I am ignoring real 20-min blocks of which we will certainly also see a few). Then in this state the difficulty should be adjusted up (because the 100 blocks came fast), but instead it would be adjusted down because of the time added to nActualTimespan.
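
To make this scenario concrete, here is a small worked example. The 13.5-day period length is an assumed figure chosen only for illustration; the 100 exception blocks and n = 20 minutes come from the discussion above.

```python
TARGET_TIMESPAN = 14 * 24 * 60 * 60  # two weeks, in seconds


def difficulty_ratio(actual_timespan):
    """New difficulty divided by old difficulty (clamp not relevant for small moves)."""
    return TARGET_TIMESPAN / actual_timespan


real_elapsed = int(13.5 * 24 * 60 * 60)   # assumed: period finished half a day early
bonus = 100 * 20 * 60                     # 100 exception blocks, 20 minutes credited each

plain = difficulty_ratio(real_elapsed)
adjusted = difficulty_ratio(real_elapsed + bonus)
print(f"without bonus: x{plain:.3f}, with bonus: x{adjusted:.3f}")
```

Without the bonus the ratio is above 1 (difficulty rises, as it should for a fast period); with the bonus it falls below 1, which is the inversion described above.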

The worst case I think is that someone tries to get as many 20-min blocks as possible constantly and with that grinds down the difficulty so that we will have a much faster block time than 10 minutes. I didn't do the exact math on it to see where we end up in an equilibrium in that case but I think this would be pretty annoying.

I think even if there is no attack we would end up with a faster block time on average. Maybe we don't want to assume any stability as the base case but I am still not sure the upside outweighs the downside on this. I will keep thinking about it more if I can think of more edge cases where this might lead to different outcomes than we want.

@murchandamus
Contributor

I missed that before, but the improved text on the BIP made me realize. It looks like an attacker could jack up the difficulty by mining a few difficulty periods with an ASIC and then stop after the last block in a difficulty period. The network would then be on a difficulty some 4^n higher than before, and stuck looking to mine a first block at full difficulty.

I previously understood that for the difficulty it just goes back to the latest block that has a non-1 difficulty, and didn’t realize that the first block would need to be mined at full difficulty. Is there a way to prevent the reset to minimum difficulty while allowing the 20-minute exception for every block? Would that require something like @ajtowns’s storing the actual difficulty in the version or similar?

@jsarenik

Tested ACK 86fea43

@fjahr
Contributor Author

fjahr commented May 29, 2024

I missed that before, but the improved text on the BIP made me realize. It looks like an attacker could jack up the difficulty by mining a few difficulty periods with an ASIC and then stop after the last block in a difficulty period. The network would then be on a difficulty some 4^n higher than before, and stuck looking to mine a first block at full difficulty.

I previously understood that for the difficulty it just goes back to the latest block that has a non-1 difficulty, and didn’t realize that the first block would need to be mined at full difficulty. Is there a way to prevent the reset to minimum difficulty while allowing the 20-minute exception for every block? Would that require something like @ajtowns’s storing the actual difficulty in the version or similar?

Yes, this has also been described here by AJ.

@Sjors suggestion to change nActualTimespan was also aimed at this problem but I have a few doubts outlined above. I think @ajtowns suggestion to use the version field is the only complete solution for this problem so far.

@murchandamus
Contributor

Yes, this has also been described here by AJ.

I must have read that before it was edited and missed the edit.

@Sjors suggestion to change nActualTimespan was also aimed at this problem but I have a few doubts outlined above. I think @ajtowns suggestion to use the version field is the only complete solution for this problem so far.

Thanks for keeping the overview! :)
I guess if someone were to troll testnet in that manner, it might have a social solution, where we could probably convince some ASIC-directing individual to point some hashpower at Testnet for one block per difficulty period until the difficulty recovers to reasonable heights. Otherwise, it might be a good time to reset again. ;)

@ajtowns
Contributor

ajtowns commented May 30, 2024

I missed that before, but the improved text on the BIP made me realize. It looks like an attacker could jack up the difficulty by mining a few difficulty periods with an ASIC and then stop after the last block in a difficulty period. The network would then be on a difficulty some 4^n higher than before, and stuck looking to mine a first block at full difficulty.

I don't think that's much of a concern -- all you'd need to do is invalidateblock the last block of the period, mine a new one with a much later timestamp, and then mine another block in the new period, that no longer has a 4x increased difficulty. At that point your new chain has more work than the old chain, and you continue from there with normal difficulty.

An attacker with 50x more hashpower than everyone else combined could conceivably rush 2015 blocks in ~6 hours, leaving the chain stalled for two weeks, and potentially repeat that attack as often as they liked, but that's probably a fair amount of hashpower to dedicate to griefing testnet, at which point switching to signet or spinning up testnet5 would presumably make sense.

@TheBlueMatt
Contributor

In practice today that probably means pre-mining a handful of blocks to add the test cases and then checkpointing the end of the pre-mine.

That makes sense if we want the test cases as early as possible in the chain. In that case someone should provide such a script in the next few months, then we can make a new genesis block, run the script, add the checkpoint and merge this PR.

I do hope we can get rid of the checkpoint code entirely (see #25725), but maybe by then testnet4 will have enough work on it that a reorg is unlikely.

It also seems to me that it's not crazy to have a testnet-specific validity rule that isn't "the checkpoint code" :).

@Sjors
Member

Sjors commented May 30, 2024

@Sjors suggestion to change nActualTimespan was also aimed at this problem

No it wasn't, it seems I was confused myself and thought there was no problem: #29775 (comment)

@murchandamus
Contributor

murchandamus commented May 30, 2024

I missed that before, but the improved text on the BIP made me realize. It looks like an attacker could jack up the difficulty by mining a few difficulty periods with an ASIC and then stop after the last block in a difficulty period. The network would then be on a difficulty some 4^n higher than before, and stuck looking to mine a first block at full difficulty.

I don't think that's much of a concern -- all you'd need to do is invalidateblock the last block of the period, mine a new one with a much later timestamp, and then mine another block in the new period, that no longer has a 4x increased difficulty. At that point your new chain has more work than the old chain, and you continue from there with normal difficulty.

I think that could reduce the difficulty by up to a factor of 16 (if you are willing to wait up to eight weeks), but I don’t see how someone needing to manually intervene and most likely still needing an ASIC mitigates the potential liveness issue here.
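
The factor of 16 follows from the clamp on the retarget timespan, which limits each adjustment to between a quarter and four times the previous difficulty. A minimal sketch, using difficulty values directly and a hypothetical `retarget` helper:

```python
TARGET_TIMESPAN = 14 * 24 * 60 * 60  # two weeks, in seconds


def retarget(old_difficulty, actual_timespan):
    """Next difficulty after a period that took `actual_timespan` seconds."""
    clamped = min(max(actual_timespan, TARGET_TIMESPAN // 4), TARGET_TIMESPAN * 4)
    return old_difficulty * TARGET_TIMESPAN // clamped


old = 1000
# Period rushed in about six hours: clamped to the maximum 4x increase.
rushed = retarget(old, 6 * 60 * 60)
# Last block re-mined with a timestamp eight weeks after the period start:
# clamped to the maximum 4x decrease.
delayed = retarget(old, 8 * 7 * 24 * 60 * 60)
print(rushed, delayed, rushed // delayed)
```

Replacing the last block therefore moves the next period from 4x the old difficulty to a quarter of it, but only if the replacement carries a timestamp eight weeks after the period began, which is where the waiting time comes from.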

An attacker with 50x more hashpower than everyone else combined could conceivably rush 2015 blocks in ~6 hours, leaving the chain stalled for two weeks, and potentially repeat that attack as often as they liked, but that's probably a fair amount of hashpower to dedicate to griefing testnet, at which point switching to signet or spinning up testnet5 would presumably make sense.

I’m not concerned with an attacker that has 50× more hashpower. If we were in a situation where Testnet has no ASICs mining it and someone points a millionth of the mainnet hashpower at Testnet (today about 650 TH/s), they could easily mine 10 difficulty periods in minutes and put the first block of the next difficulty period well out of the range of non-ASICs. If someone points more hashrate at it, e.g. in the range of a thousandth of mainnet, it could easily shoot up the difficulty even out of the range of a small number of S9s, which nominally have 14 TH/s.
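
A back-of-the-envelope estimate of this scenario, under stated assumptions: the chain starts at difficulty 1, the attacker gets the maximum 4x increase after every period, and timestamp constraints are ignored. `rush` is a hypothetical helper; the 650 TH/s figure is the one from the comment.

```python
HASHES_PER_DIFFICULTY = 2 ** 32   # expected hashes per block at difficulty 1
RETARGET_BLOCKS = 2016
HASHRATE = 650e12                 # 650 TH/s


def rush(periods, start_difficulty=1):
    """Seconds to mine `periods` full periods, and the difficulty afterwards."""
    seconds, difficulty = 0.0, start_difficulty
    for _ in range(periods):
        seconds += RETARGET_BLOCKS * difficulty * HASHES_PER_DIFFICULTY / HASHRATE
        difficulty *= 4   # maximum increase per retarget
    return seconds, difficulty


seconds, final_difficulty = rush(10)
print(f"{seconds / 60:.0f} minutes, next period at difficulty {final_difficulty:,}")
```

In this model the ten periods take on the order of an hour, most of it spent in the last period, and leave the next block at a difficulty of 4^10, roughly one million.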

As far as I recall, that’s exactly the problem we had with Testnet 1 and 2 that led to the 20-minute exception being introduced in the first place: mining pools occasionally test their setups on Testnet and cause the difficulty to shoot up like crazy.

@fjahr
Contributor Author

fjahr commented May 31, 2024

If we were in a situation where Testnet has no ASICs mining it and someone points a millionth of the mainnet hashpower at Testnet (today about 650 TH/s), they could easily mine 10 difficulty periods in minutes and put the first block of the next difficulty period well out of the range of non-ASICs.

The hashpower on Testnet3 seems to have been fluctuating around ~500 TH/s over the past 2 years: https://mempool.space/testnet/graphs/mining/hashrate-difficulty If that data is correct, I am not so concerned about 650 TH/s but I am not sure if the 20-min exception blocks mess with those statistics.

Why do you think this issue has never happened on Testnet3? Someone can run up the difficulty there today like you describe and leave on the last block of a difficulty adjustment period, the chain would stall the same as with the code here.

I wrote a little script to check how many adjustment blocks took longer than 20min and what the most extreme cases were. It doesn't look too bad honestly, over these 13 years we have had a handful that took more than an hour, two over 5h, but I would have expected to find worse. The delta is just comparing the timestamp to the previous block, so granted, there could be some shenanigans going on and that's not the real delta but even if the prev block actually had a timestamp 2 hours in the future for example, it still doesn't seem too terrible to me to have these few outliers over 13 years.

See logs
Height: 16128, Time Difference: 1201 seconds
Height: 20160, Time Difference: 1338 seconds
Height: 26208, Time Difference: 1613 seconds
Height: 32256, Time Difference: 2430 seconds
Height: 38304, Time Difference: 1843 seconds
Height: 92736, Time Difference: 1742 seconds
Height: 108864, Time Difference: 1379 seconds
Height: 127008, Time Difference: 19999 seconds
Height: 205632, Time Difference: 3426 seconds
Height: 225792, Time Difference: 2892 seconds
Height: 243936, Time Difference: 2509 seconds
Height: 270144, Time Difference: 8494 seconds
Height: 276192, Time Difference: 1410 seconds
Height: 278208, Time Difference: 5749 seconds
Height: 308448, Time Difference: 2898 seconds
Height: 314496, Time Difference: 1884 seconds
Height: 316512, Time Difference: 8331 seconds
Height: 318528, Time Difference: 3992 seconds
Height: 322560, Time Difference: 5553 seconds
Height: 328608, Time Difference: 5121 seconds
Height: 374976, Time Difference: 1428 seconds
Height: 530208, Time Difference: 3827 seconds
Height: 546336, Time Difference: 2645 seconds
Height: 921312, Time Difference: 1830 seconds
Height: 925344, Time Difference: 1664 seconds
Height: 1034208, Time Difference: 1205 seconds
Height: 1116864, Time Difference: 3558 seconds
Height: 1179360, Time Difference: 2390 seconds
Height: 1288224, Time Difference: 1469 seconds
Height: 1296288, Time Difference: 2198 seconds
Height: 1354752, Time Difference: 2686 seconds
Height: 1518048, Time Difference: 2423 seconds
Height: 1572480, Time Difference: 1607 seconds
Height: 1576512, Time Difference: 3231 seconds
Height: 1578528, Time Difference: 4781 seconds
Height: 1580544, Time Difference: 1757 seconds
Height: 1665216, Time Difference: 4310 seconds
Height: 1721664, Time Difference: 2089 seconds
Height: 1830528, Time Difference: 1461 seconds
Height: 1901088, Time Difference: 1672 seconds
Height: 1903104, Time Difference: 2170 seconds
Height: 1971648, Time Difference: 2227 seconds
Height: 1975680, Time Difference: 1683 seconds
Height: 2034144, Time Difference: 1615 seconds
Height: 2096640, Time Difference: 1849 seconds
Height: 2098656, Time Difference: 1940 seconds
Height: 2102688, Time Difference: 1495 seconds
Height: 2134944, Time Difference: 1494 seconds
Height: 2163168, Time Difference: 1359 seconds
Height: 2225664, Time Difference: 3482 seconds
Height: 2245824, Time Difference: 1852 seconds
Height: 2314368, Time Difference: 2212 seconds
Height: 2342592, Time Difference: 1461 seconds
Height: 2348640, Time Difference: 1685 seconds
Height: 2419200, Time Difference: 2266 seconds
Height: 2421216, Time Difference: 1885 seconds
Height: 2425248, Time Difference: 1477 seconds
Height: 2431296, Time Difference: 18461 seconds
Height: 2530080, Time Difference: 1603 seconds
Height: 2534112, Time Difference: 1884 seconds
Height: 2538144, Time Difference: 1213 seconds
Height: 2578464, Time Difference: 3759 seconds
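
The script itself is not included in the thread. A minimal sketch of this kind of check, assuming the block timestamps have already been collected into a list indexed by height (for example from a node's block headers), could look like this. The function name and the synthetic data are hypothetical.

```python
def slow_adjustment_blocks(timestamps, interval=2016, threshold=1200):
    """Return (height, delta) for each first block of a difficulty period
    whose timestamp is more than `threshold` seconds after its parent."""
    results = []
    for height in range(interval, len(timestamps), interval):
        delta = timestamps[height] - timestamps[height - 1]
        if delta > threshold:
            results.append((height, delta))
    return results


# Synthetic chain: 10-minute blocks, with an extra 1000 s gap before height 2016.
timestamps = [h * 600 for h in range(4033)]
for h in range(2016, 4033):
    timestamps[h] += 1000

slow = slow_adjustment_blocks(timestamps)
for height, delta in slow:
    print(f"Height: {height}, Time Difference: {delta} seconds")
```

As noted above, the delta only compares header timestamps, so it inherits any timestamp manipulation in the previous block.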
